Hide code cell source
# pip install dash
Hide code cell source
import pandas as pd
import plotly.graph_objs as go
import plotly.express as px
import matplotlib.pyplot as plt
import numpy as np

import dash
from dash import dcc
from dash import html

Correlation between Happiness and Economic Factors#

01-07-2023

Information Visualization: data story final

Group: B4

| Student name | Student number |
| --- | --- |
| Evan Lont | 14729210 |
| Joep Haanen | 14657368 |
| Lotte te Kulve | 14648911 |
| Robin Kuipers | 14273810 |

Introduction#

Over the last few years, a lot has happened in the world. From the end of 2019 to the first half of 2022, the world went through a global pandemic. During and after the pandemic, inflation rose to its highest level in almost 40 years (OECD Economic Outlook, 2023). Additionally, at the beginning of 2022, a war between Russia and Ukraine broke out. All of these events could have a significant influence on happiness around the world.

The analysis will focus on the correlation between the world happiness rate and economic factors.

Among the economic factors, we have decided to focus on inflation. This is mainly due to our own experience with inflation and that of the people around us. In the past few years, we have heard a lot about the problems around inflation and the potential risks of an ever-increasing inflation rate. This has been broadcast on the news and shown in newspapers, but it is most visible in our own daily expenses, which have all gone up. Groceries and restaurants have become more expensive, and even basic needs like a haircut have seen an enormous increase in cost over the past years. Inflation has become an important topic of conversation that we all deal with. This is why we have set our focus on this topic and its correlation with the happiness of people around the world.

The “World Happiness Report” dataset and relevant economic indicators such as GDP per capita, inflation rates, and consumer price index (CPI) will be used to investigate the relationship between subjective well-being and economic stability. Through data analysis, the aim is to determine whether countries with higher economic indicators tend to exhibit higher happiness scores. This study aims to contribute to understanding how economic factors influence levels of happiness at both individual and societal levels.
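To make the idea of "correlation" concrete, the sketch below computes a Pearson correlation coefficient between the happiness score and the GDP contribution. It uses only the five 2020 rows that are displayed in the preprocessing section of this report, so the result is illustrative and far too small a sample to support a conclusion. The variable names `sample` and `r` are our own.

```python
import pandas as pd

# Five rows from the 2020 happiness table shown in this report.
# Five points only illustrate the calculation; they do not support a conclusion.
sample = pd.DataFrame({
    "Country name": ["CHE", "NLD", "NZL", "CAN", "SAU"],
    "Happiness score": [7.5599, 7.4489, 7.2996, 7.2321, 6.4065],
    "Explained by: Log GDP per capita": [1.390774, 1.338946, 1.242318, 1.301648, 1.334329],
})

# Series.corr uses the Pearson correlation by default.
r = sample["Happiness score"].corr(sample["Explained by: Log GDP per capita"])
print(f"Pearson r = {r:.3f}")
```

A value close to 1 or -1 would indicate a strong linear relationship, while a value close to 0 would indicate a weak one.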

Datasets and preprocessing#

For the first dataset, we chose the World Happiness Report dataset from the Sustainable Development Solutions Network, which is based on Gallup World Poll data. For the second dataset, we chose an inflation dataset from the OECD that covers at least ten years up to and including 2022. The inflation dataset also lends itself to visualizations, because it includes inflation trends before, during, and to some extent after the COVID-19 pandemic. Upon analyzing the two datasets, it became clear that both needed some filtering.

Dataset 1: World happiness report#

Source: https://worldhappiness.report/ed/2020/#appendices-and-data

Number of records: 20

Number of variables: 10

Description: As part of our data analysis, we used two datasets from the World Happiness Report (WHR), one for 2020 and one for 2022. The WHR is an annual publication of the Sustainable Development Solutions Network and relies on data collected by the Gallup World Poll. The report is written by a group of independent experts, each with expertise in different variables that the WHR measures. It covers these variables for more than 150 countries worldwide, of which we have chosen to analyze ten. The primary objective of the yearly report is to reflect a worldwide demand for more attention to happiness and to inspire governments to improve their policies. During our analysis we work with the variables of our ten chosen countries in order to make findings about the relationship between the happiness score and several economic factors. These variables include ones found inside the WHR, such as GDP per capita and generosity, but also external variables such as yearly inflation.

| Variable | Datatype | Measurement scale |
| --- | --- | --- |
| Country name | Categorical | Nominal |
| Regional indicator | Categorical | Nominal |
| Happiness score | Continuous | Interval |
| upperwhisker | Continuous | Interval |
| lowerwhisker | Continuous | Interval |
| Logged GDP per capita | Continuous | Ratio |
| Healthy life expectancy | Continuous | Interval |
| Generosity | Continuous | Interval |
| Perceptions of corruption | Continuous | Interval |
| Explained by: Log GDP per capita | Continuous | Ratio |
| Explained by: Healthy life expectancy | Continuous | Ratio |
| Explained by: Freedom to make life choices | Continuous | Ratio |
| Explained by: Generosity | Continuous | Ratio |
| Explained by: Social support | Continuous | Ratio |
| Explained by: Perceptions of corruption | Continuous | Ratio |
| Dystopia + residual | Continuous | Interval |

Preprocessing#

For detailed preprocessing, visit: happiness data preprocessing

For each variable we asked ourselves the following questions:

  • What are the variables in the data?

  • Do we need all the data points and variables?

  • Are there data that are out of scope?

  • Are there privacy or ethical issues in the data?

  • Is it practical to process the variable that we want?

To prevent the dataset from becoming too large, the project focuses on the data for the years 2020 and 2022, because some of the values in the datasets varied a lot between these years. Another reason for selecting only two years is that we want to find out how much the data can differ within such a short timeframe. The analysis uses the variables of our ten chosen countries.

Based on the requirements for the data, the following actions were taken:

  • The removal of specific columns from the world happiness dataset, including:

    • Regional indicator

    • Upperwhisker

    • Lowerwhisker

  • Rearranging the columns to facilitate clear identification of the country and year under consideration.

  • Selecting and retaining only the countries necessary for our analysis, while removing the rest. The final selection includes: ‘Switzerland’, ‘Netherlands’, ‘New Zealand’, ‘Canada’, ‘Saudi Arabia’, ‘Chile’, ‘Portugal’, ‘China’, ‘South Africa’, ‘India’. We chose these countries because they are located in different regions and differ considerably in economic wellbeing.
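The actions listed above could be sketched roughly as follows. This is not the code from our preprocessing notebook: the `raw` frame is a hypothetical stand-in with placeholder values, and the `Year` column and the `preprocess` helper are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the raw World Happiness Report table.
# The numbers are placeholders, not real data.
raw = pd.DataFrame({
    "Country name": ["Switzerland", "Finland", "India"],
    "Regional indicator": ["Western Europe", "Western Europe", "South Asia"],
    "Happiness score": [7.5, 7.8, 3.6],
    "upperwhisker": [7.6, 7.9, 3.7],
    "lowerwhisker": [7.4, 7.7, 3.5],
})

selected_countries = [
    "Switzerland", "Netherlands", "New Zealand", "Canada", "Saudi Arabia",
    "Chile", "Portugal", "China", "South Africa", "India",
]

def preprocess(df, year):
    """Drop unused columns, keep the selected countries, put country and year first."""
    out = df.drop(columns=["Regional indicator", "upperwhisker", "lowerwhisker"])
    out = out[out["Country name"].isin(selected_countries)].copy()
    out["Year"] = year  # assumed helper column to identify the report year
    leading = ["Country name", "Year"]
    remaining = [c for c in out.columns if c not in leading]
    return out[leading + remaining].reset_index(drop=True)

cleaned = preprocess(raw, 2020)
print(cleaned)
```

Filtering with `isin` keeps the country list in one place, so changing the selection only requires editing `selected_countries`.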

Hide code cell source
happiness_2020 = pd.read_csv('happiness_2020.csv')
happiness_2020.head(n=5)
| | Unnamed: 0 | Country name | Happiness score | Dystopia + residual | Explained by: Log GDP per capita | Explained by: Social support | Explained by: Healthy life expectancy | Explained by: Freedom to make life choices | Explained by: Generosity | Explained by: Perceptions of corruption |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 2 | CHE | 7.5599 | 2.350267 | 1.390774 | 1.472403 | 1.040533 | 0.628954 | 0.269056 | 0.407946 |
| 1 | 5 | NLD | 7.4489 | 2.352117 | 1.338946 | 1.463646 | 0.975675 | 0.613626 | 0.336318 | 0.368570 |
| 2 | 7 | NZL | 7.2996 | 2.128108 | 1.242318 | 1.487218 | 1.008138 | 0.646790 | 0.325726 | 0.461268 |
| 3 | 10 | CAN | 7.2321 | 2.195269 | 1.301648 | 1.435392 | 1.022502 | 0.644028 | 0.281529 | 0.351702 |
| 4 | 26 | SAU | 6.4065 | 2.203119 | 1.334329 | 1.309950 | 0.759818 | 0.548477 | 0.087441 | 0.163322 |
Hide code cell source
happiness_2022 = pd.read_csv('happiness_2022.csv')
happiness_2022.head(n=5)
| | Unnamed: 0 | Country | Happiness score | Dystopia (1.83) + residual | Explained by: GDP per capita | Explained by: Social support | Explained by: Healthy life expectancy | Explained by: Freedom to make life choices | Explained by: Generosity | Explained by: Perceptions of corruption |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 3 | CHE | 7.512 | 2.153 | 2.026 | 1.226 | 0.822 | 0.677 | 0.147 | 0.461 |
| 1 | 4 | NLD | 7.415 | 2.137 | 1.945 | 1.206 | 0.787 | 0.651 | 0.271 | 0.419 |
| 2 | 9 | NZL | 7.200 | 1.954 | 1.852 | 1.235 | 0.752 | 0.680 | 0.245 | 0.483 |
| 3 | 14 | CAN | 7.025 | 1.924 | 1.886 | 1.188 | 0.783 | 0.659 | 0.217 | 0.368 |
| 4 | 24 | SAU | 6.523 | 2.075 | 1.870 | 1.092 | 0.577 | 0.651 | 0.078 | 0.180 |

Dataset 2: Inflation (CPI)#

Source: https://data.oecd.org/price/inflation-cpi.htm

Number of records: 490

Number of variables: 8

Description: The “Inflation (CPI)” dataset from the OECD contains information on consumer price index (CPI) and inflation rates across various countries. It provides a comprehensive view of the changes in price levels for goods and services over time, allowing for the analysis and comparison of inflation rates among different economies. The dataset includes indicators such as headline inflation, core inflation, and various sub-components of CPI. It serves as a valuable resource for understanding and monitoring inflation trends at a global level.

| Variable | Datatype | Measurement scale |
| --- | --- | --- |
| Location | Categorical | Nominal |
| Indicator | Categorical | Nominal |
| Subject | Categorical | Nominal |
| Measure | Categorical | Nominal |
| Frequency | Categorical | Nominal |
| Time | Continuous | Interval |
| Value | Continuous | Interval |
| Flag codes | Categorical | Nominal |

Preprocessing#

For detailed preprocessing, visit: inflation data preprocessing

  • Country names were changed to abbreviations.

  • Both datasets contained information per country, but the inflation dataset used abbreviations as values while the happiness dataset used full country names. To facilitate data comparison for specific countries, we needed to align the values either to abbreviations or full country names. We decided to use abbreviations for consistency.
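As a minimal sketch of why aligned keys matter, the example below maps full country names to abbreviations and then joins the two tables on that key. The three rows reuse 2020 values shown in this report; the frame names `happiness_demo` and `inflation_demo` and the check for unmapped names are our own additions for illustration.

```python
import pandas as pd

# Three illustrative rows with 2020 values that appear in this report.
happiness_demo = pd.DataFrame({
    "Country name": ["Netherlands", "New Zealand", "Canada"],
    "Happiness score": [7.4489, 7.2996, 7.2321],
})
inflation_demo = pd.DataFrame({
    "LOCATION": ["CAN", "NLD", "NZL"],
    "TIME": [2020, 2020, 2020],
    "Value": [108.2104, 107.5100, 107.6488],
})

country_mapping = {"Netherlands": "NLD", "New Zealand": "NZL", "Canada": "CAN"}

# Add the abbreviation as a join key.
happiness_demo["LOCATION"] = happiness_demo["Country name"].map(country_mapping)

# Names missing from the dictionary become NaN; fail early instead of losing rows silently.
unmapped = happiness_demo.loc[happiness_demo["LOCATION"].isna(), "Country name"].tolist()
if unmapped:
    raise ValueError(f"No abbreviation for: {unmapped}")

# Join happiness and inflation on the shared country code.
merged = happiness_demo.merge(inflation_demo, on="LOCATION", how="inner")
print(merged[["LOCATION", "Happiness score", "Value"]])
```

Applying `map` to a column that already holds abbreviations would turn every value into NaN, which is why the check for unmapped names is worth keeping.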

Hide code cell source
inflation = pd.read_csv('inflation.csv')
# The 'Flag Codes' and 'FREQUENCY' columns are not dropped here:
# the loaded file no longer contains them.
# Split the CPI values by year for the comparison between 2020 and 2022
inflation2020 = inflation[inflation['TIME'] == 2020]
inflation2022 = inflation[inflation['TIME'] == 2022]
inflation.head(n=5)
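| | Unnamed: 0 | LOCATION | INDICATOR | SUBJECT | MEASURE | TIME | Value |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 146211 | CAN | CPI | TOT | IDX2015 | 2020 | 108.2104 |
| 1 | 146213 | CAN | CPI | TOT | IDX2015 | 2022 | 119.4957 |
| 2 | 149430 | NLD | CPI | TOT | IDX2015 | 2020 | 107.5100 |
| 3 | 149432 | NLD | CPI | TOT | IDX2015 | 2022 | 121.4267 |
| 4 | 149731 | NZL | CPI | TOT | IDX2015 | 2020 | 107.6488 |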
Hide code cell source
# This code was used in the data cleaning process; more data cleaning code is in datacleaning.ipynb.
# It stays commented out because the CSV files loaded above already contain the abbreviations.

# # list all unique country names
# unique_countries = pd.unique(happiness_2020['Country name'])

# # list all unique abbreviations
# unique_abbr = pd.unique(inflation['LOCATION'])

# # map all unique country names in a dictionary with abbreviations as values
# country_mapping = {
#     "Switzerland": "CHE",
#     "Netherlands": "NLD",
#     "New Zealand": "NZL",
#     "Canada": "CAN",
#     "Saudi Arabia": "SAU",
#     "Chile": "CHL",
#     "Japan": "JPN",
#     "Portugal": "PRT",
#     "China": "CHN",
#     "South Africa": "ZAF",
#     "India": "IND"
# }

# # replace the full names in 'Country name' with their abbreviations
# happiness_2020['Country name'] = happiness_2020['Country name'].map(country_mapping)
# happiness_2020.head()

# # export to csv
# happiness_2020.to_csv('happiness_2020.csv', index=False)

Perspective 1: Inflation has a minimal impact on happiness.#

While inflation is an important economic indicator, its influence on happiness might be overshadowed by other factors. This perspective suggests that while economic stability is crucial, it may not be the sole determinant of happiness. To see if this perspective is valid, three visualisations have been created.

The first visualisation shows the increase in inflation, measured with the consumer price index (CPI), between 2020 and 2022 for each selected country. Each line in the graph represents one country. The graph shows that prices rose in every country between 2020 and 2022. The values are index numbers relative to 2015, which is set to 100, so a value of 130 means that prices in that year were 30% higher than in 2015.
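As a small worked example of how to read these index values, the sketch below converts them into percentage changes. The CPI values for Canada and the Netherlands are the ones shown in the inflation table earlier in this report; the helper `pct_change` is our own name.

```python
# CPI index values (2015 = 100) for two of the selected countries.
cpi = {
    "CAN": {2020: 108.2104, 2022: 119.4957},
    "NLD": {2020: 107.5100, 2022: 121.4267},
}

def pct_change(start, end):
    """Percentage change from one index value to another."""
    return (end / start - 1) * 100

for country, values in cpi.items():
    since_2015 = pct_change(100, values[2022])          # relative to the base year
    since_2020 = pct_change(values[2020], values[2022])  # relative to 2020
    print(f"{country}: {since_2015:.1f}% above 2015, {since_2020:.1f}% above 2020")
```

The change since 2015 can be read directly from the index, while the change between 2020 and 2022 requires dividing the two index values.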

Hide code cell source
# Slope chart: CPI index (2015 = 100) per country in 2020 and 2022
layout = go.Layout(
    xaxis=go.layout.XAxis(
        type='category',  # treat the two years as categories, not as a numeric range
        tickvals=['2020', '2022'],  # set custom tick values
        ticktext=['2020', '2022'],  # set custom tick labels
    ),
    width=600,
    height=600,
)

data = []
for country in inflation2020['LOCATION'].unique():
    # Extract the rows for this country in each year
    country_data_2020 = inflation2020[inflation2020['LOCATION'] == country]
    country_data_2022 = inflation2022[inflation2022['LOCATION'] == country]

    # Skip countries without a 2022 value; .iloc[0] on an empty selection raises IndexError
    if country_data_2022.empty:
        continue

    # One line per country, connecting its 2020 and 2022 values
    trace = go.Scatter(
        x=['2020', '2022'],
        y=[country_data_2020['Value'].iloc[0], country_data_2022['Value'].iloc[0]],
        mode='lines+markers',
        name=country,
    )
    data.append(trace)

fig = go.Figure(data=data, layout=layout)

fig.update_layout(
    title="Consumer price index by country (2015 = 100)",
    xaxis_title="Year",
    yaxis_title="CPI (2015 = 100)",
)

fig.show()